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Abstract: A discretization scheme for variable coefficient Helmholtz problems on two- 
dimensional domains is presented. The scheme is based on high-order spectral approx- 
imations and is designed for problems with smooth solutions. The resulting system of 
] linear equations is solved using a direct solver with 0(iV L5 ) complexity for the pre- 

y—i ■ computation and 0(N log N) complexity for the solve. The fact that the solver is direct 

is a principal feature of the scheme, since iterative methods tend to struggle with the 
Helmholtz equation. Numerical examples demonstrate that the scheme is fast and highly 
accurate. For instance, using a discretization with 12 points per wave-length, a Helmholtz 
| problem on a domain of size 100 x 100 wavelengths was solved to ten correct digits. The 

■ computation was executed on an office desktop; it involved 1.6M degrees of freedom and 

y—i , required 100 seconds for the pre-computation, and 0.3 seconds for the actual solve. 

<T^ ! 1. Introduction 

The paper describes a technique for constructing an approximate solution to the variable coefficient 
i -^h ! Helmholtz equation 

( -Au(x) - k 2 (1 - b(x))u(x) = x e n, 

'I (L1) \ u(as) = /(as) xeF, 

where $7 is a rectangular domain with boundary T, where the Helmholtz parameter n is real, and 
\Q ■ where b is a given smooth scattering potential. The scheme can straight-forwardly be adapted to 
handle other variable coefficient elliptic problems, as well as free space scattering problems in M 2 . 
The primary limitation of the method is that it requires the solution u to be smooth in 17. 

The equation (II. lj) is discretized via a composite spectral scheme. The domain O, is split into 
\ small square (or rectangular) patches. On each patch, the solution u is represented via tabulation 
£NJ ■ on a tensor product grid of Chebyshev points, see Figure [TJ The Laplace operator is approximated 
via a spectral differentiation matrix acting on each local grid, and then equation (jl.ip is enforced 
' strongly at all tabulation nodes in the interior of each patch. To glue patches together, continuity 
• of both the potential u and its normal derivative are enforced at the spectral interpolation nodes on 
j_j | the boundaries between patches. 

The discretization scheme is combined with a direct solver for the resulting linear system. The 



fact that the solver is direct rather than iterative is a principal feature of the scheme, since iterative 
solvers tend to struggle for Helmholtz problems of the kind considered here [2]. The direct solver 
organizes the patches in the discretization into a binary tree of successively larger patches. The solver 
then involves two stages, one that involves an upwards pass, and one that involves a downwards pass: 

(1) A pre-computation stage where an approximation to the solution operator for (jl.ip is com- 
puted. This is done via a single sweep of the hierarchical tree, going from smaller patches 
to larger. For each leaf in the tree, a local solution operator, and an approximation to the 
Dirichlet-to-Neumann (DtN) map for the patch are constructed. For a parent node in the 
tree, a local solution operator and a local DtN operator are computed from an equilibrium 
equation formed using the DtN operators of the children of the patch. The pre-computation 
stage has asymptotic complexity OlN 1 ' 5 ). 

(2) A solve stage that takes as input a vector of Dirichlet data tabulated on T, and constructs 
tabulated values of u at all internal grid points. The solve stage involves a single downwards 
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sweep through the hierarchical tree of patches, going from larger patches to smaller. The 
solve stage has asymptotic complexity O(NlogN). 

Numerical experiments indicate that the spectral convergence of the method makes it both highly 
accurate and computationally efficient. For instance, the equation was solved for a box whose 
size exceeded 100 x 100 wave-lengths in less than 2 minutes on a standard office laptop. A 20-th 
order spectral scheme with 12 points per wave-length was used in the local approximation on the 
patches. The resulting solution was accurate to between 7 and 10 digits, depending on the nature of 
the scattering potential b in (|1.1|) . The discretization used a total of N = 1.6 • 10 6 degrees of freedom. 
The computational time was dominated by the pre-computation stage; the actual solve stage took 
only 0.3 seconds. This makes the scheme particularly powerful in situations where an equation such 
as (|l.lj) needs to be solved for a sequence of different boundary functions /. 

The scheme proposed is conceptually related to a direct solver for the Lippman-Schwinger equation 
proposed in 2002 by Yu Chen pp. The schemes are different in that the method proposed here is not 
based on a Lippman-Schwinger formulation, and uses spectral approximations on the smallest patches 
in the hierarchical tree. Comparing the efficiencies of the two schemes is difficult since the paper [T] 
does not report numerical results, and we have been unable to find reports of implementations of 
the scheme. The scheme proposed here is also conceptually related to the classical nested dissection 
algorithm for finite element and finite difference matrices [3], and to recently proposed 0(iV L5 ) direct 
solvers for BIEs on surfaces in 3D [3]. 

For clarity, the current paper focusses on the simple boundary value problem (jl.ip involving the 
Helmholtz elliptic operator and Dirichlet boundary data. The scheme can with trivial modifications 
be applied to more general elliptic operators 

- c n (x)[dfu](x) - 2c 12 (x)[d 1 d 2 u](x) - c 22 (x) [d$u] (x) 

+ a(x)[diu](x) + C2(x)[d2u](x) + c(x) u(x) = 0, 

coupled with Dirichlet, Neumann, or mixed boundary data. It has for instance been successfully 
tested on convection-diffusion problems that are strongly dominated (by a factor of 10 4 ) by the 
convection term, see Section 16.41 Moreover, the scheme can with minor modifications be applied to 
a free space scattering problem such as 

(1.2) - Au(aj) - k 2 (1 - b(x))u(x) = f(x), xeM 2 , 

coupled with appropriate radiation conditions at infinity. A standard assumption is that / is sup- 
ported outside of some (bounded) square region O while the smooth function b is supported inside 
f2. The scheme described in this note computes the DtN operator for (jl.2p on fi. The DtN operator 
for the exterior domain f2 c can be computed via Boundary Integral Equation techniques; and by 
combining the two, one can solve the free space scattering problem, see Section 17.31 

The method proposed has a vulnerability in that it crucially relies on the existence of DtN operators 
for all patches in the hierarchical tree. This can be problematic due to resonances: For certain wave- 
numbers k, there exist non-trivial solutions that have zero Dirichlet boundary data. We have found 
that in practice, this problem almost never arises when processing domains that are a couple of 
hundred wave-lengths or less in size. Moreover, if a resonant patch should be encountered, this will 
be detected and counter-measures can be taken, see Sections 15.31 and 17.51 

The asymptotic complexity of the proposed method is 0(iV L5 ). For the case where the wave- 
number k is increased as N grows to keep a constant number of discretization points per wave-length 
(i.e. k ~ iV - 5 ), we do not know how to improve the complexity. However, for the case where 
the wave-number is kept constant as N increases, O(N) complexity can very likely be attained by 
exploiting internal structure in the DtN operators. The resulting scheme would be a spectral version 
of recently published accelerated nested dissection schemes such as [U [JUl [12] . 

An early version of the work reported was published on arXiv as [9] . 



Figure 1. The box = [0, l] 2 is split into 4x4 leaf boxes, and a Cartesian grid of 
Chebyshev nodes is placed on each leaf box. The figure shows local grids of size 7x7 
for clarity; in actual computations, local grids of size 21 x 21 were typically used. 



The paper is organized as follows: Section [2] introduces notation and lists some classical material on 
spectral interpolation and differentiation. Section [3] describes how to compute the solution operator 
and the DtN operator for a leaf in tree (which is discretized via a single tensor-product grid of 
Chebyshev nodes). Section |4] describes how the DtN operator for a larger patch consisting of two 
small patches can be computed if the DtN operators for the smaller patches are given. Section [5] 
describes the full hierarchical scheme. Section [6] reports the results of some numerical experiments. 
Section [7] describes how the scheme can be extended to more general situations. 



2. Preliminaries — spectral differentiation 

This section introduces notation for spectral differentiation on tensor product grids of Chebyshev 
nodes on the square domain [—a, a] 2 . This material is classical, see, e.g., Trefethen |llj . (While 
we restrict attention to square boxes here, all techniques generalize trivially to rectangular boxes of 
moderate aspect ratios.) 

Let p denote a positive integer. The Chebyshev nodes on [—a, a] are the points 

U = a cos((i - 1)tt/(p - 1)), i = 1, 2, 3, . . . , p. 
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Let {xk} v k=l denote the set of points of the form (ij, tj) for 1 < i,j < p. Let V p denote the linear 
space of sums of tensor products of polynomials of degree p — 1 or less. V p has dimension p 2 . Given 
a vector u 6 MP 2 , there is a unique function such that u{x k ) = u (^) f° r 1 < k < p 2 ■ (A reason 

Chebyshev nodes are of interest is that for any fixed x € [—a, a] 2 , the map u i— > u(x) is stable.) Now 
define D, E, and L as the unique p 2 x p 2 matrices such that 

(2.1) [DuP) = [8iu](x k ), k = l, 2, 3, ...,p 2 , 

(2.2) [Eu](fc) = [d 2 u](x k ), k = l, 2, 3, ...,p 2 , 

(2.3) [LuP) = [-Au](x k ), k = 1, 2, 3, ...,p 2 . 




(a) (b) 

Figure 2. Notation for the leaf computation in Section 03 (a) A leaf before elimina- 
tion of interior (white) nodes, (b) A leaf after elimination of interior nodes. 

3. Leaf computation 

This section describes the construction of a discrete approximation to the Dirichlet-to-Neumann 
operator associated with the boundary value problem (jl.ip for a square patch O. We discretize (jl.ip 
via a spectral method on a tensor product grid of Chebyshev nodes on f2. In addition to the DtN 
operator, we also construct a solution operator to (jl.ip that maps the Dirichlet data on the nodes 
on the boundary of f2 to the value of u at all internal interpolation nodes. 
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3.1. Notation. Let f2 denote a square patch. Let {a?fc}f =1 denote the nodes in a tensor product 
grid of p x p Chebyshev nodes. Partition the index set 

{1,2,..., p 2 } = I c Uli 

in such a way that I e contains all nodes on the boundary of 0, and I\ denotes the set of interior 
nodes, see Figure EJ^a). Let u be a function that satisfies (II. ip on Q and let 

u = [u(x k )]f =1 , v = [d lU (x k )]f =1 , w = [d 2 u(x k )]f =v 

denote the vectors of samples of u and its partial derivatives. We define the short-hands 

Ui = ll(Ii), Vi = v(ii), W; = w(/i), u e = u(J e ), v e = v(i" e ), w e = w(7 e ). 

Let L, D, and E denote spectral differentiation matrices corresponding to the operators — A, d±, and 
$2 , respectively (see Section [2]) . We use the short-hand 

D iie = D(/i,/ e ) 

to denote the part of the differentiation matrix D that maps exterior nodes to interior nodes, etc. 

3.2. Equilibrium condition. The operator (11. ip is approximated via the matrix 

A = -L - K 2 diag(b), 
where b denotes the vector of pointwise values of b, 

b = [b(x k )}f =1 . 

The equation we enforce on Q is that the vector A u should evaluate to zero at all internal nodes, 

(3.1) A M Ui + A ie u c = 0, 

where 

A M = A(J i> / i ) ) Ai,e = A(Ji,/e). 
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Figure 3. Notation for the merge operation described in Section |H The rectangular 
domain fi is formed by two squares £l a and Qp. The sets I±, I2, and ^3 form the 
exterior nodes (black), while I4 consists of the interior nodes (white). 

Solving (|3.ip for u;, we obtain 

(3.2) u i = Uu c , 
where 

(3.3) U = -(A M )- 1 A i , e . 

3.3. Constructing the DtN operator. Let V and W denote the matrices that map boundary 
values of the potential to boundary values of d\u and d^u. These are constructed as follows: Given 
the potential u c on the boundary, we reconstruct the potential U; in the interior via (|3.2p . Then, 
since the potential is known on all Chebyshev nodes in fi, we can determine the gradient on the 
boundary {v c , w c } via spectral differentiation on the entire domain. To formalize, we find 

V e = De e U e + D e ,i Uj = D e , e U e + D e ,i U U e = V U c , 

where 

(3.4) V = D e ,e + D c ,i U. 

An analogous computation for w c yields 

(3.5) W = E e ,e + E eii U. 



4. Merge operation 

Let Q denote a rectangular domain consisting of the union of the two smaller rectangular domains, 

f2 = n, a u £i/3, 

as shown in Figure Moreover, suppose that approximations to the DtN operators for O a and £lp 
are available. (Represented as matrices that map boundary values of u to boundary values of d\u 
and c?2tt.) This section describes how to compute a solution operator U that maps the value of a 
function u that is tabulated on the boundary of Q to the values of u on interpolation nodes on the 
internal boundary, as well as operators V and W that map boundary values of u on the boundary of 
f2 to values of the d\u and &2U tabulated on the boundary. 
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4.1. Notation. Let f2 denote a box with children Q, a and Q,p. For concreteness, let us assume that 
Vt a and Q.p share a vertical edge. We partition the points on d£l a and ddp into four sets: 

I\ Boundary nodes of Q. a that are not boundary nodes of Qp. 

I<i Boundary nodes of £lp that are not boundary nodes of £l a . 

13 The two nodes that are boundary nodes of Q a , of Qp, and also of the union box £1. 

14 Boundary nodes of both Q a and Hp that are not boundary nodes of the union box Q. 

Figure illustrates the definitions of the iVs. Let u denote a function such that 

— Au(x) — k 2 b(x) u(x) = 0, x € £1, 

and let Uj, Wj denote the values of u, d\u, and &2U, restricted to the nodes in the set "j". 
Moreover, set 



(4.1) 



114, 



and 



U3 



Finally, let M a , W", V' 3 , W' 3 denote the operators that map potential values on the boundary to 
values of d\u and d<iu on the boundary for the boxes f2 Q and Qp. We partition these matrices 
according to the numbering of nodes in Figure O 
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4.2. Equilibrium condition. Suppose that we are given a tabulation of boundary values of a 
function u that satisfies (jl.ip on f2. In other words, we are given the vectors u 4 , 112, and 113. We can 
then reconstruct the values of the potential on the interior boundary (tabulated in the vector 114) 
by using information in (|4.2p and (|4.3p . Simply observe that there are two equations specifying the 
normal derivative across the internal boundary (tabulated in v 4 ), and combine these equations: 

V^Ui + Vf )3 U3 + V? j4 U 4 = Vf )2 U 2 + VJ3U3 + Vf )4 U4. 

Solving for u 4 we get 

(4-4) u 4 = (Vf >4 - M{ A )-\M{ 2 U2 + Vf j3 u 3 - V^m - V2 i3 u 3 ). 

Now set 

(4-5) U = ( M% 4 - Vj 4 ) - 1 [-Ml, I Vj 2 I V J i3 - V 4 « 3 ] , 

to find that ()4.4p is (in view of (|4.ip ) precisely the desired formula 
(4.6) u i = Uu c . 

The net effect of the merge operation is to eliminate the interior tabulation nodes in £l a and £lp so 
that only boundary nodes in the union box Q are kept, as illustrated in Figure IH 



(a) 



(b) 



Figure 4. Merge operation for two small boxes to form a new large box. (a) Before 
elimination of interior (white) nodes, (b) After elimination of interior nodes. 



4.3. Constructing the DtN operators for the union box. We will next build a matrix V that 
constructs the derivative d\u on d£l given values of u on dVL. In other words 



Vl 






V2 


= V 




. V 3 . 




. U 3 . 



To this end, observe from (|4.2p and (|4.3p that 

(4.7) Vl = V? :1 Ul + Vf j3 u 3 + V? >4 u 4 = Vfr Ul + V? >3 u 3 + V£ 4 U u c 

(4.8) v 2 = Vf )2 u 2 + V£ 3 u 3 + V" 4 u 4 = Vf 3 u 2 + V- 3 u 3 + V" 4 U u c . 



Equations (14. 2j) and (14. 3p provide two different formulas for v 3 , either of which could be used. For 
numerical stability, we use the average of the two: 



(4.9) 
(4.10) 

Combining (|4,7p 
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In other words, 
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An analogous computation for w c yields 
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Figure 5. The square domain f2 is split into 4x4 leaf boxes. These are then gathered 
into a binary tree of successively larger boxes as described in Section f5. H One possible 
enumeration of the boxes in the tree is shown, but note that the only restriction is 
that if box r is the parent of box a, then r < a. 



5. The full hierarchical scheme 

5.1. The algorithm. Now that we know how to construct the DtN operator for a leaf (Section [3j), 
and how to merge the DtN operators of two neighboring patches to form the DtN operator of their 
union (Section we are ready to describe the full hierarchical scheme for solving (jl.ip . 

First we partition the domain Q into a collection of square (or possibly rectangular) boxes, called 
leaf boxes. These should be small enough that a small spectral mesh with pxp nodes (for, say, p = 20) 
accurately interpolates both any potential solution u of and its partial derivatives diu, and 
—Ait. Let {cCfcjfcLi denote the points in this mesh. (Observe that nodes on internal boundaries are 
shared between two or four local meshes.) Next construct a binary tree on the collection of boxes by 
hierarchically merging them, making sure that all boxes on the same level are roughly of the same 
size, cf. Figure [H The boxes should be ordered so that if r is a parent of a box a, then r < a. We 
also assume that the root of the tree (i.e. the full box fi) has index r = 1. 

With each box r, we define two index vectors I[ and as follows: 

ij A list of all indices of the nodes on the boundary of r. 

If For a leaf r, I[ is a list of all interior nodes of r. 

For a parent r, I[ is a list of all its interior nodes that are not interior nodes of its children. 
Next we execute a pre-computation in which for every box r, we construct the following matrices: 
U r The matrix that maps the values of u on the boundary of a box to the values of u 
on the interior nodes of the box. In other words, u(I[) = U T u(JJ). 

V r The matrix that maps the values of u on the boundary of a box to the values of v 
(tabulating dujdx\) on the boundary of a box. In other words, v(7J) = V T u(ij). 

W r The matrix that maps the values of u on the boundary of a box to the values of w 
(tabulating du/dx^) on the boundary of a box. In other words, w(ij) = W T u(Ig). 
To this end, we scan all boxes in the tree, going from smaller to larger. For any leaf box r, a dense 
matrix A T of size p 2 x p 2 that locally approximates the differential operator in (jl.ip is formed. Then 
the matrices IT, V r , and W r are constructed via formulas (13.30 . (|3.4jh and (j3.5|) . For a parent box r 
with children o\ and 02, the matrices U r , V r , and W r are formed from the DtN operators encoded 
in the matrices V CTl , W CTl , V CT2 , W CT2 using the formulas (gZEL) , and (l4~T2l) . The full algorithm 

is summarized in Figure [H An illustrated cartoon of the merge process is provided in Appendix |Aj 
Once all the matrices {U r } r have been formed, it is a simple matter to construct a vector u holding 
approximations to the solution u of (jl.ip . The nodes are scanned starting with the root, and then 
proceeding down in the tree towards smaller boxes. When a box r is processed, the value of u is 
known for all nodes on its boundary (i.e. those listed in /J). The matrix U r directly maps these 







Pre-computation 
This program constructs the global Dirichlet-to-Neumann operator for (jl.ip . 
It also constructs all matrices U T required for constructing u at all interior nodes. 
It is assumed that if node r is a parent of node a, then r < a. 



for T = iVboxes, -ZVboxes — 1, iVboxes ~ 2, . . . , 1 

if (r is a leaf) 

br = [b{ Xj )\ m 

U T = -f-L i . i -« 2 diag(bn) 'ke 



V T = D ee + D ei U T 
W r = E ' + D 'i ir 



else 



Let o"i and a"2 be the children of r. 
Partition I^ 1 and I" 2 into vectors I\ 
if (<7i and 02 are side-by-side) 

else 

-l 



I2, I3, and I4 as shown in Figure El 
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Delete V 71 , W ai , V ff2 , W CT2 . 
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end for 



Figure 6. Pre-computation 



values to the values of u on the nodes in the interior of r (i.e. those listed in When all nodes 
have been processed, all entries of u have been computed. Figure [7] summarizes the solve stage. 

Remark 5.1. Every interior meshpoint belongs to the index vector Jj 7 " for precisely one node r. 
In other words (J T l\ forms a disjoint union of the interior mesh points. 

Remark 5.2. The way the algorithms are described, we compute for each node r matrices V T and 
W T that allow the computation of both the normal and the tangential derivative at any boundary 
node, given the Dirichlet data on the boundary. This is done for notational convenience only. In 
practice, any rows of V T and W T that correspond to evaluation of tangential derivatives need never 
be evaluated since tangential derivatives do not enter into consideration at all. 



Remark 5.3. The merge stage is exact when performed in exact arithmetic. The only approximation 
involved is the approximation of the solution u on a leaf by its interpolating polynomial. 

5.2. Complexity analysis. The analysis of the asymptotic cost of the algorithm in Section 15.11 
closely mimics the analysis of the classical nested dissection algorithm [TJ [3j. For simplicity, we 
analyze the simplest situation in which a square domain is divided into 4 L leaf boxes, each holding 
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Solver 

This program constructs an approximation u to the solution u of (jl.ip . 

It assumes that all matrices U r have already been constructed in a pre-computation. 

It is assumed that if node r is a parent of node a, then r < a. 



u(k) = f{x k ) for all k <E 
for r = 1, 2, 3, ... , AWes 

u(IT) = U T u(I[). 
end for 



Figure 7. Solve stage. 



a spectral cartesian mesh with p x p points. The total number of unknowns in the system is then 
roughly 4 L p 2 (to be precise, N = 4 L (p - l) 2 + 2 L+1 (p - 1) + 1). 

Cost of leaf computation: Evaluating the formulas ()3.3|) , (j3.4j) , and (|3.5p requires dense matrix algebra 
on matrices of size roughly p 2 x p 2 . Since there are about N/p 2 leaves, the total cost is 

T lca{ ~^x{p 2 ) 3 ~Np 4 . 

Cost of the merge operations: For an integer I € {0, 1,2,..., L}, we refer to "level £" as the collection 
of boxes whose side length is 2~ e times the side of the full box Q (so that £ = corresponds to the 
root and £ = L corresponds to the set of leaf boxes). To form the matrices U r , V r , and W r for a 
box on level £, we need to evaluate each of the formulas (|4.5|) . (|4.11|) . and (|4.12|) three times, with 
each computation involving matrices of size roughly 2~^iV 5 x 2~ e N ' 5 . Since there are 4^ boxes on 
level £, we find that the cost of processing level £ is 

T e ~4 e x (2^iV°- 5 ) 3 ~2- £ iV L5 . 
Adding the costs at all levels, we get 

L-l L-l 
T mcrgc ~ Yl T * ~ E 2 ^ iVl ' 5 ~ iVl ' 5 - 

Cost of solve stage: The cost of processing a non-leaf node on level I is simply the cost of a matrix- 
vector multiply involving the dense matrix U r of size 2" 

-e N o.5 x 2 - e N - 5 . For a leaf, IT is of size 

roughly p 2 x p. Therefore 

Ticaf ~-xp 2 j)+^4'x (2-% a5 J ~ Np + NL ~ iVp + TV log(iV). 

^ £=0 

5.3. Problem of resonances. The scheme presented in Section 15.11 will fail if one of the patches 
in the hierarchical partitioning is resonant in the sense that there exist non-trivial solutions to the 
Helmholtz equation that have zero Dirichlet data at the boundary of the patch. In this case, the 
Neumann data for the patch is not uniquely determined by the Dirichlet data, and the DtN operator 
cannot exist. In practice, this problem will of course arise even if we are merely close to a resonance, 
and will be detected by the discovery that the inverse matrix in the formulas f)3.3[) and (14. 5f) for the 
solution operator U T is ill-conditioned. 

It is our experience from working with domains of size one hundred wave-lengths or less that 
resonances are very rare; one almost never encounters the problem. Nevertheless, it is important to 
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monitor the conditioning of the formulas (|3.3|) and (|4,5p to ensure the accuracy of the final answer. 
Should a problem be detected, the easiest solution would be to simply start the computation over 
with a different tessellation of the domain f2. Very likely, this will resolve the problem. 

Current efforts to formulate variations of the scheme that are inherently not vulnerable to reso- 
nances are described in Section [731 

6. Numerical experiments 

This section reports the results of some numerical experiments with the method described in 
Section 15.11 The method was implemented in Matlab and the experiments executed on a Lenovo 
W510 laptop with a quad core Intel i7 Q720 processor with 1.6GHz clockspeed, and 16GB of RAM. 

The speed and memory requirements of the algorithm were investigated by solving the special 
case where b = in and the Dirichlet data is simply set to equal a known analytic solution. 

The results are reported in Section 16.11 We also report the errors incurred in the special case, but it 
should be noted that these represent only a best case estimate of the errors since the equation solved 
is particularly benign. To get a more realistic estimate of the errors in the method, we also applied 
it to three situation in which exact solutions are not known: A problem with variable coefficients in 
Section 16. 2\ a problem on an L-shaped domain in Section 16. 3[ and a convection-diffusion problem in 
Section 16.41 

6.1. Constant coefficient Helmholtz. We solved the basic Helmholtz equation 

(6.1) -Au(x) - k 2 u(x) = x G $7, 

(6.2) u(x) = f(x) xeT, 

where Q, = [0, l] 2 and T = dQ. The boundary data were in this first set of experiments chosen as the 
restriction to V of an exact solution 

(6.3) u cxact {x) = Y (k\x - x\), 

where x = (—0.2, 0.4), and where Yq is the O'th Bessel function of the second kind. This experiment 
serves two purposes. The first is to systematically measure the speed and memory requirements of 
the method described in Section 15.11 The second is to get a sense of what errors can be expected 
in a "best case" scenario with a very smooth solution. Observe however that the situation is by no 
means artificial since the smoothness of this case is exactly what one encounters when the solver is 
applied to a free space scattering problem as described in Section 17.31 

The domain f2 was discretized into n x n patches, and on each patch a p x p Cartesian mesh of 
Chebyshev nodes was placed. The total number of degrees of freedom is then 

N = (n(p - l)) 2 + 2n(p - 1) + 1. 

We tested the method for p 6 {6, 11, 21, 41}. For each fixed p, the method was executed for several 
different mesh sizes n. The wave-number k was chosen to keep a constant of 12 points per wavelength, 
or k = 2irn(p- 1)/12. 

Since the exact solution is known in this case, we computed the direct error measure 

-Spot = max recomputed ( x k ) ~~ u exact( x k)\i 
k : ajfcGSZ 

where {x k }^ =1 is the set of all mesh points. We also computed the maximum error in the gradient 
of u on the boundary as computed via the V and W operators on the root box, 

-Egrad = , max { ^computed (#fc) ~ [^l^exact] ( x k) | , ^computed ~ [^cxact] ( X k) \ } ■ 
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p 


N 


N 

1 v wave 


^inv 


^solve 


-Epot 


-Egrad 


M 


M/N 








(sec) 


(sec) 






(MB) 


(reals/DOF) 


6 


6561 


6.7 


0.28 


0.0047 


8.02105e-03 


3.06565e-01 


2.8 


56.5 


6 


25921 


13.3 


0.96 


0.0184 


1.67443e-02 


1.33562c+00 


12.7 


64.2 


6 


103041 


26.7 


4.42 


0.0677 


3.60825c-02 


5.46387c+00 


56.2 


71.5 


6 


410881 


53.3 


20.23 


0.2397 


3.39011c-02 


1.05000c+01 


246.9 


78.8 


6 


1640961 


106.7 


88.73 


0.9267 


7.48385e-01 


4.92943e+02 


1075.0 


85.9 


11 


6561 


6.7 


0.16 


0.0019 


2.67089e-05 


1.08301e-03 


2.9 


58.0 


11 


25921 


13.3 


0.68 


0.0073 


5.30924e-05 


4.34070e-03 


13.0 


65.7 


11 


103041 


26.7 


3.07 


0.0293 


1.01934e-04 


1.60067e-02 


57.4 


73.0 


11 


410881 


53.3 


14.68 


0.1107 


1.07747c-04 


3.49637c-02 


251.6 


80.2 


11 


1640961 


106.7 


68.02 


0.3714 


2.17614e-04 


1.37638c-01 


1093.7 


87.4 


21 


6561 


6.7 


0.23 


0.0011 


2.56528e-10 


1.01490e-08 


4.4 


87.1 


21 


25921 


13.3 


0.92 


0.0044 


5.24706c-10 


4.44184c-08 


18.8 


95.2 


21 


103041 


26.7 


4.68 


0.0173 


9.49460c-10 


1.56699e-07 


80.8 


102.7 


21 


410881 


53.3 


22.29 


0.0727 


1.21769e-09 


3.99051c-07 


344.9 


110.0 


21 


1640961 


106.7 


99.20 


0.2965 


1.90502e-09 


1.24859e-06 


1467.2 


117.2 


21 


6558721 


213.3 


551.32 


20.9551 


2.84554e-09 


3.74616e-06 


6218.7 


124.3 


41 


6561 


6.7 


1.50 


0.0025 


9.88931e-14 


3.46762e-12 


7.9 


157.5 


41 


25921 


13.3 


4.81 


0.0041 


1.58873e-13 


1.12883e-ll 


32.9 


166.4 


41 


103041 


26.7 


18.34 


0.0162 


3.95531c-13 


5.51141e-ll 


137.1 


174.4 


41 


410881 


53.3 


75.78 


0.0672 


3.89079c-13 


1.03546e-10 


570.2 


181.9 


41 


1640961 


106.7 


332.12 


0.2796 


1.27317e-12 


7.08201c-10 


2368.3 


189.2 



Table 1. Results from an experiment with a constant coefficient Helmholtz problem 
on a square. The boundary data were picked so that the analytic solution was known; 
as a consequence, the solution is smooth, and can be smoothly extended across the 
boundary. The wave-number was chosen to keep a constant of 12 discretization points 
per wave-length. 



Table [T] reports the following variables: 
N wave The number of wave-lengths along one side of 0. 

ii nv The time in seconds required to execute the pre-computation in Figure El 

^soivc The time in seconds required to execute the solve in Figure [TJ 

M The amount of RAM used in the pre-computation in MB. 
The table also reports the memory requirements in terms of number of the number of double precision 
reals that need to be stored per degree of freedom in the discretization. 

The high-order version of the method [p = 41) was also capable of performing a high accuracy 
solve with only six points per wave-length. The results are reported in Table (2j 

We see that increasing the spectral order is very beneficial for improving accuracy. However, the 
speed deteriorates and the memory requirements increase as p grows. Choosing p = 21 appears to 
be a good compromise. 

Remark 6.1. In the course of executing the numerical examples, the instability problem described 
in Section [5.31 was detected precisely once (for p = 16 and 12 points per wave length). 

6.2. Variable coefficient Helmholtz. We solved the equation 

(6.4) -Au(sc) - k 2 (1 - b(x)) u(x) = a: G O, 

(6.5) u(x) = f{x) x € T, 
where fl = [0, l] 2 , where V = dfl, and where 

b(x) = (sin(4-/rxi) sin(4-7ra;2)) 2 • 
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p 


N 


wave 




^■solvc 


-Spot 


-^grad 


M 


M/N 








(sec) 


(sec) 






(MB) 


(reals/DOF) 


41 


6561 


13.3 


1.30 


0.0027 


1.54407e-09 


1.78814e-07 


7.9 


157.5 


41 


25921 


26.7 


4.40 


0.0043 


1.42312e-08 


2.35695e-06 


32.9 


166.4 


41 


103041 


53.3 


17.54 


0.0199 


1.73682c-08 


5.84193c-06 


137.1 


174.4 


41 


410881 


106.7 


72.90 


0.0717 


2.28475c-08 


1.51575e-05 


570.2 


181.9 


41 


1640961 


213.3 


307.37 


0.3033 


4.12809c-08 


5.51276c-05 


2368.3 


189.2 



twice as fast (so that we keep only 6 points per wave-length). 



The Helmholtz parameter was kept fixed at ft = 80, corresponding to a domain size of 12.7 x 12.7 
wave lengths. The boundary data was given by 

f(x) = cos(8xi) (1 - 2x 2 ). 

The equation (|6.4p was discretized and solved as described in Section 16.11 The speed and memory 
requirements for this computation are exactly the same as for the example in Section 16.11 (they do 
not depend on what equation is being solved), so we now focus on the accuracy of the method. We 
do not know of an exact solution, and therefore report pointwise convergence. Letting un denote 
the value of u computed using N degrees of freedom, we used 

E 1 ^ = u N (x) - u AN {x) 

as an estimate for the pointwise error in u at the point x = (0.75, 0.25). We analogously estimated 
convergence of the normal derivative at the point y = (0.75, 0.00) by measuring 

E% nd = w N (y)-w iN (y). 

The results are reported in Table [3l Table [5] reports the results from an analogous experiment, but 
now for a domain of size 102 x 102 wave-lengths. 

We observe that accuracy is almost as good as for the constant coefficient case. Ten digits of 
accuracy is easily attained, but getting more than that seems challenging; increasing N further leads 
to no improvement in accuracy. The method appears to be stable in the sense that nothing bad 
happens when N is either too large or too small. 

6.3. L-shaped domain. We solved the equation 

(6.6) -Au{x) - k 2 u{x) = x G O, 

(6.7) u(x) = f(x) xeT, 
where is the L-shaped domain 

n = [0,2] 2 \[1,2] 2 , 

and where the Helmholtz parameter k is held fixed at re = 40, making the domain 12.7 x 12.7 
wave-lengths large. The pointwise errors were estimated at the points 

£ = (0.75,0.75), and y = (1.25, 1.00), 

via 

Effi = un(x) — v,4n(x), and E^ d = wn(x) — W4n(x). 
The results are given in Table 

We observe that the errors are in this case significantly larger than they were for square domains. 
This is presumably due to the fact that the solution u is singular near the re-entrant corner in the 
L-shaped domain. (When boundary conditions corresponding to an exact solution like (I6.3P are 
imposed, the method is just as accurate as it is for the square domain.) Nevertheless, the method 
easily attains solutions with between four and five correct digits. 
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p 


N 


pts per wave 


u N (x) 


pint 
h N 


wn(v) 


ZT'bnd 
N 


6 
6 
6 
6 
6 
6 


6561 
25921 
103041 
410881 
1640961 
6558721 


6.28 
12.57 
25.13 
50.27 
100.53 
201.06 


-2.505791196753718 
-2.260084219562163 
-2.427668162910011 
-2.445455646843485 
-2.446688310709834 
-2.446767218259172 


-2.457e-01 
1.676e-01 
1.779e-02 
1.233c-03 
7.891c-05 


-661.0588680825 
7926.8096554095 
-73484.9989261573 
-34547.6403539568 
-33313.0000081604 
-33236.7252190062 


-8.588e+03 
8.141e+04 
-3.894c+04 
-1.235e+03 
-7.627c+01 


11 
11 
11 
11 
11 


6561 
25921 
103041 
410881 
1640961 


6.28 
12.57 
25.13 
50.27 
100.53 


-2.500353149793093 
-2.446599788642489 
-2.446772604281610 
-2.446772509631734 
-2.446772509994819 


-5.375c-02 
1.728e-04 

-9.465e-08 
3.631c-10 


-27023.0713474340 
-33547.3621639994 
-33232.0940315585 
-33231.6186528531 
-33231.6179197169 


6.524e+03 
-3.153c+02 
-4.754e-01 
-7.331e-04 


21 
21 
21 
21 
21 


6561 
25921 
103041 
410881 
1640961 


6.28 
12.57 
25.13 
50.27 
100.53 


-2.448236804078803 
-2.446772430608166 
-2.446772510369452 
-2.446772510428384 
-2.446772510724068 


-1.464e-03 
7.976e-08 
5.893e-ll 
2.957c-10 


-32991.4583727724 
-33231.6118304666 
-33231.6178142514 
-33231.6178087887 
-33231.6177808723 


2.402e+02 
5.984e-03 
-5.463e-06 
-2.792e-05 


41 
41 
41 
41 
41 


6561 
25921 
103041 
410881 
1640961 


6.28 
12.57 
25.13 
50.27 
100.53 


-2.446803898373796 
-2.446772510320572 
-2.446772510443995 
-2.446772510472872 
-2.446772510550181 


-3.139e-05 
1.234e-10 
2.888e-ll 
7.731e-ll 


-33233.0037457220 
-33231.6179029824 
-33231.6178135860 
-33231.6178008533 
-33231.6177541722 


-1.386e+00 
-8.940e-05 
-1.273e-05 
-4.668e-05 


Table 3. 
12.7 x 12.7 


Results from a variable coefficient Helmholtz problem on a domain of size 
wave-lengths. 


P 


TV 


pts per wave 


u N (x) 


Clint 
N 


wn(v) 


ZT'bnd 
N 


21 
21 
21 
21 
21 
21 


6561 
25921 
103041 
410881 
1640961 
6558721 


0.79 
1.57 
3.14 
6.28 
12.57 
25.13 


0.007680026148649 
0.003595286353011 
-1.611350573683137 
-3.063762877533994 
-3.068320356836451 
-3.068320286093162 


4.085e-03 
1.615c+00 
1.452c+00 

4.557c-03 
-7.074e-08 


3828.84075823538 
-2829.88055527014 
-2650.80640712917 
3299.72573600854 
3307.49786015114 
3307.49776422768 


6.659c+03 
-1.791c+02 
-5.951C+03 
-7.772C+00 
9.592c-05 


41 
41 
41 
41 
41 


6561 
25921 
103041 
410881 
1640961 


0.79 
1.57 
3.14 
6.28 
12.57 


-0.000213617359480 
0.160581838547352 
0.902057033817060 
-3.068320045777766 
-3.068320286282830 


-1.608e-01 
-7.415e-01 
3.970c+00 
2.405e-07 
(-1.897c-10) 


-833.919575889393 
171.937456004515 
1969.187023322940 
3307.497852217234 
3307.497764294187 


-1.006e+03 
-1.797e+03 
-1.338e+03 
8.792e-05 
(6.651c-8) 



Table 4. Results from a variable coefficient Helmholtz problem on a domain of size 
102 x 102 wave-lengths. 



6.4. Convection diffusion. We solved the equation 

(6.8) -Au(x) - 1000 [d 2 u](x) = x € ft, 

(6.9) u{x) = f(x) x G T, 

where ft = [0, l] 2 , where T = 9ft, and where the boundary data was given by 

f(x) = cos(xi) e X2 . 

The equation (16. 4p was discretized and solved as described in Section IBTTl Note that this case involves 
a non-oscillatory solution and does not fit the template (jl.ip . It is included to illustrate how the 
spectral method handles a sharp gradient in the solution. 



p 


N 


pts per wave 


u N (x) 


runt 
-£% 






6 


19602 


12.57 


8.969213152495405 


2.258e+00 


226.603823940515 


8.748e+01 


6 


77602 


25.13 


6.711091204119065 


1.317e-01 


139.118986915759 


4.949e+00 


6 


308802 


50.27 


6.579341284597024 


8.652e-03 


134.169908546083 


3.261c-01 


6 


1232002 


100.53 


6.570688999911585 




133.843774958376 




11 


19602 


12.57 


6.571117172871830 


9.613e-04 


133.865552472382 


3.851c-02 


11 


77602 


25.13 


6.570155895761215 


5.154e-05 


133.827043929015 


5.207e-03 


11 


308802 


50.27 


6.570104356719250 


1.987c-05 


133.821836691967 


2.052e-03 


11 


1232002 


100.53 


6.570084491282650 




133.819785089497 




21 


19602 


12.57 


6.570152809642857 


4.905e-05 


133.898328735897 


7.663e-02 


21 


77602 


25.13 


6.570103763348836 


1.951c-05 


133.821703687416 


1.943e-03 


21 


308802 


50.27 


6.570084254517955 


7.743e-06 


133.819760759394 


7.996c-04 


21 


1232002 


100.53 


6.570076511737839 




133.818961147570 





Table 5. Results from a constant coefficient Helmholtz problem on an L-shaped 
domain of size 12.7 x 12.7 wave-lengths. 



p 


N 


un(x) 




wn{v) 


TT^bncl 
N 


11 
11 
11 
11 


25921 
103041 
410881 
1640961 


1.987126286905920 
1.987445414657945 
1.987445414657547 
1.987445414655092 


-3.1916-04 
3.979e-13 
2.455c-12 


1255.25512379751 
1255.26231503666 
1255.26296795281 
1255.26298684450 


-7.1916-03 
-6.529c-04 
-1.889e-05 


21 
21 
21 
21 


25921 
103041 
410881 
1640961 


1.987076984861468 
1.987445414658047 
1.987445414658348 
1.987445414658608 


-3.684e-04 
-3.009e-13 
-2.600e-13 


1255.26075989546 
1255.26294637880 
1255.26298691798 
1255.26298699680 


-2.186e-03 
-4.054e-05 
-7.881c-08 


41 
41 
41 
41 


25921 
103041 
410881 
1640961 


1.988004762686629 
1.987445414657579 
1.987445414658550 
1.987445414659787 


5.593c-04 
-9.706c-13 
-1.237c-12 


1255.26290210213 
1255.26298687891 
1255.26298699669 
1255.26298699832 


-8.478c-05 
-1.178e-07 
-1.636e-09 


Table 6. Errors for the convection c 


iffusion problem ( 


6.8). 



The pointwise errors were estimated at the points 

x = (0.75, 0.25), and y = (0.75, 0.00), 

via 

Ef^ = un(x) — U4n(x), and E^ d = wn(x) — W4n(x). 

The results are given in Table [6l Table [7] reports results from an analogous experiment, but now 
with the strength of the convection term further increased by a factor of 10. 

We observe that the method has no difficulties resolving steep gradients, and that moderate order 
methods (p = 11) perform very well here. 

7. Extensions 

7.1. Linear complexity algorithms. In discussing the asymptotic complexity of the scheme in 
Section 15.11 it is important to distinguish between the case where ./V is increased for a fixed wave- 
number k, and the case where k ~ iV 5 to keep the number of degrees of freedom per wavelength 
constant. 

Let us first discuss the case where the wave-number k is kept fixed as N is increased. In this 
situation, the matrices U T will for the parent nodes become highly rank deficient (to finite precision). 
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p 


N 


u N {x) 


TT'int 
-£% 




771 bnd 


21 


25921 


1.476688750775769 


-4.700e-01 


13002.9937044202 


4.325e+02 


21 


103041 


1.946729131937971 


-4.206e-02 


12570.4750256324 


-7.862e-03 


21 


410881 


1.988785675941193 


-1.716e-06 


12570.4828877374 


-4.900e-03 


21 


1640961 


1.988787391699051 


(6.719c-13) 


12570.4877875310 


(-4.411e-04) 


41 


25921 


2.587008191566030 


6.407c-01 


13002.1084152522 


4.316c+02 


41 


103041 


1.946284950165041 


-4.250e-02 


12570.4835546978 


-2.618e-03 


41 


410881 


1.988785277235741 


-2.114e-06 


12570.4861729647 


-2.127c-03 


41 


1640961 


1.988787391699218 




12570.4882994934 





Table 7. Errors for a convection diffusion problem similar to (|6.8 
even more convection dominated operator A = —A — 10 000<9 2 . 



but now for the 



By factoring these matrices in the pre-computation, the solve phase can be reduced to 0{N) com- 
plexity. Moreover, the off-diagonal blocks of the matrices V r and W r will also be of low numerical 
rank; technically, they can efficiently be represented in data sparse formats such as the %-matrix 
format [6], or the Hierarchically Semi- Separable-format of [5J [13] . This property can be exploited to 
reduce the complexity of the pre-computation from 0(N 1,5 ) to O(N) in a manner similar to what is 
done for classical nested dissection for finite-element and finite-difference matrices in [H E \W\ [12] . 
Note that while the acceleration of the solve phase is trivial, it takes some work to exploit the more 
complicated structure in V r and W T . 

The case where k ~ iV 0,5 as N increases is more complicated. In this situation, the numerical 
ranks of the matrices U T and the off-diagonal blocks of V T and W T will be large enough that no 
reduction in asymptotic complexity can be expected. However, the matrices are in practice far from 
full rank, and substantial savings can be achieved by exploiting techniques such as those described 
in the previous paragraph. 

7.2. General elliptic problems. The algorithm in Section [57T1 can straight-forwardly be generalized 
to elliptic equations like 

- c n {x)[d 2 u](x) - 2ci 2 (x)[d 1 d 2 u](x) - c 22 (x)[d 2 u](x) 

+ ci(x)[diu](x) + c 2 (x)[d 2 u\(x) + c(x) u(x) = 

as long as the coefficients are smooth and the leading order operator is elliptic. The only modification 
required is that in the leaf computation, the matrix representing a spectral approximation of the 
differential operator be replaced by something like 

A r = -C n D 2 - 2C 12 DE - C 22 E 2 + dD + C 2 E + C, 

where Cn is a diagonal matrix with entries cu(xj) for j € I T , etc. Observe that only the leaf 
computation needs to be modified, the merge steps remain exactly the same. The stability and 
accuracy of the method of course depends on the signs and relative magnitudes of the coefficient 
terms, but tentative numerical experiments indicate that the method is effective for a broad range 
of different problems, including convection-dominated convection-diffusion equations. 

7.3. Free space scattering problems in the plane. If you know the DtN operator for the in- 
homogeneous square, you can very rapidly solve an exterior scattering problem as follows. Consider 
the equation 

-Au(sb) - k 2 (1 - b{x)) u(x) = f(x), x e M 2 . 

Appropriate radiation conditions at infinity are imposed on u. We assume that b is compactly 
supported inside the domain and that / is supported outside ft. The standard technique is to 
look for a solution u of the form 

U = V + w, 
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where v is an incoming field and w is the outgoing field. The incoming field is defined by 

(7.1) v{x) = [<j> K *f){x)=f <f> K (x-y)f(y)dy, 

where <p K is the fundamental solution to the free space Helmholtz problem. Then — Av — k 2 v = f, 
and since b{x) = for x S fi°, we find that the outgoing potential w must satisfy 

(7.2) - Aw(x) - k 2 w(x) = 0, xe tt c . 

Now use the method in Section [5.11 to construct the DtN map T for the problem 

— Au(x) — k 2 u(x) + K 2 b(x) u(x) = 0, x G VI. 

Then we know that 

(7.3) v n \ r + w n \ r = T (v\ r + w\ r ) , 

where v n and w n are normal derivatives of v and w, respectively. Now use BIE methods to construct 
the DtN map S for the problem (|7.2p on the exterior domain fi c . Then w must satisfy (|7.2p . 

(7.4) w n \r = Sw\r- 
Combining (j7.3|) and (j7.4|) we find 

u n |r + S w\r = Tv\r + Tw\r- 

In other words, 

(7.5) (S - T) w\t = Tv\ r - v n \ r . 

Observing that both v\r and can be obtained from (|7.ip . and that S and T are now available, 
we see that w\r can be determined by solving (|7.5p . 

7.4. Problems in three dimensions. There is no conceptual difficulty in generalizing the method 
to problems in M 3 . However, since the fraction of points located on interfaces will increase, the 
complexity of the pre-computation and the solve stages will be 0(N 2 ) and 0(_/V 4//3 ), respectively. 
The acceleration techniques described in Section \7. II will likely become very valuable in constructing 
highly efficient implementations in 3D. 

7.5. Formulating a scheme impervious to resonances. As discussed in Section I5.3| the fact 
that the proposed scheme relies on Dirichlet-to-Neumann maps causes problems when the hierarchical 
partitioning of the domain involves a sub-domain that admits resonant modes. While this issue can 
be managed quite easily, it would clearly be preferable to formulate a variation of the scheme that 
is inherently not vulnerable in this regard. One possible approach would be use a so called "total 
wave" approach suggested by Yu Chen of New York University (private communication). The idea 
is to maintain for each leaf not an operator that maps Dirichlet data to Neumann data, but rather 
a collection of matching pairs of Dirichlet and Neumann data (represented as vectors of tabulated 
values on the boundary). If you know the collection of pairs for two adjacent boxes, you can construct 
the collection for the union box via a merge procedure similar to the one used in Section HI This 
approach appears to not suffer from any problems in the case where one of the involved boxes admits 
resonant modes, but has the drawback that it would be less amenable to the acceleration technique 
described in Section 17.11 
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8. Conclusions 

The paper describes a composite spectral scheme for solving variable coefficient elliptic PDEs with 
smooth coefficients on simple domains such as squares and rectangles. The method involves a direct 
solver and can in a single sweep solve problems for which state-of-the-art iterative methods require 
thousands of iterations. High order spectral approximations are used. As a result, potential fields 
can be computed to a relative precision of about 10~ 10 using twelve points per wave-length or less. 

Numerical experiments indicate that the method is very fast. For a problem involving 1.6M degrees 
of freedom discretizing a domain of size 100 x 100 wavelengths, the pre-computation stage of the 
direct solver took less than 2 minutes on a laptop. Once the solution operator had been computed, 
the actual solve that given a vector of Dirichlet data on the boundary constructs the solution at 
all 1.6M internal tabulation points required only 0.3 seconds. The computed solution had a relative 
accuracy of 10 -9 . 

The asymptotic complexity of the method presented is 0(N 1,5 ) for the construction of the solution 
operator, and 0(N log N) for a solve once the solution operator has been created. For a situation 
where TV" is increased while the wave-number is kept fixed, it appears possible to improve the asymp- 
totic complexity to O(N) for both the pre-computation and the solve stages (see Section ETJ , but 
such a code has not yet been written. 

The method presented has a short-coming in that it is in principle vulnerable to resonances. It 
relies on a hierarchical partitioning of the domain, and if any one of the boxes in this partitioning 
is resonant, the method breaks down. In practice, this problem appears to happen very rarely, and 
can be both detected and remedied if it does occur. 
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Appendix A. Graphical illustration of the hierarchical merge process 

This section provides an illustrated overview of the hierarchical merge process described in detail 
in Section \5. II and in Figure El The figures illustrate a situation in which a square domain Q = [0, l] 2 
is split into 4x4 leaf boxes on the finest level, and an 8 x 8 spectral grid is used in each leaf. 

Step 1: Partition the box O into 16 small boxes that each holds an 8 x 8 Cartesian mesh of Chebyshev 
nodes. For each box, identify the internal nodes (marked in white) and eliminate them as described 
in Section [3j Construct the solution operator U , and the DtN operators encoded in the matrices V 
and W. 




Step 2: Merge the small boxes by pairs as described in Section [H The equilibrium equation for each 
rectangle is formed using the DtN operators of the two small squares it is made up of. The result is 
to eliminate the interior nodes (marked in white) of the newly formed larger boxes. Construct the 
solution operator U and the DtN matrices V and W for the new boxes. 




Step 3: Merge the boxes created in Step 2 in pairs, again via the process described in Section |U 




Step 5: Repeat the merge process one final time to obtain the DtN operator for the boundary of 
the whole domain. 



